Context Vectors Are Reflections of Word Vectors in Half the Dimensions
https://arxiv.org/pdf/1902.09859.pdf
This paper takes a step towards a theoretical analysis of the relationship between word embeddings and context embeddings in models such as word2vec. We start from basic probabilistic assumptions on the nature of word vectors, context vectors, and text generation. These assumptions are well supported, either empirically or theoretically, by the existing literature. Next, we show that under these assumptions the widely used word-word PMI matrix is approximately a random symmetric Gaussian ensemble. This, in turn, implies that context vectors are reflections of word vectors in approximately half the dimensions. As a direct application of our result, we suggest a theoretically grounded way of tying weights in the SGNS model.
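A minimal sketch of the kind of weight tying this result suggests is given below, assuming the reflection is realized as a fixed diagonal ±1 matrix with half of its entries negative; the shapes, scales, and the choice of which coordinates are flipped are illustrative, not the paper's implementation.

    # Minimal sketch, assuming the result is used to tie SGNS context vectors
    # to word vectors through a fixed reflection in half the dimensions:
    # c_w = D w_w, with D diagonal and roughly half of its entries equal to -1.
    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, dim = 10_000, 100

    W = rng.normal(scale=1.0 / np.sqrt(dim), size=(vocab_size, dim))  # word vectors

    signs = np.ones(dim)
    signs[: dim // 2] = -1.0          # flip (roughly) half of the coordinates
    D = np.diag(signs)

    C = W @ D                         # tied context vectors: no separate matrix learned

    # The SGNS score of a (word, context) pair then becomes <w_i, D w_j>.
    print(W[3] @ D @ W[7])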
Syllable-aware Neural Language Models: A Failure to Beat Character-aware Ones
Syllabification does not seem to improve word-level RNN language modeling quality when compared to character-based segmentation. However, our best syllable-aware language model, achieving performance comparable to the competitive character-aware model, has 18%-33% fewer parameters and is trained 1.2-2.2 times faster. Comment: EMNLP 2017
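As a rough picture of what syllable-level versus character-level segmentation means for such models, here is a generic sub-word composition sketch; the segmentation, embedding sizes, and mean-pooling composer are illustrative assumptions, not the architectures compared in the paper.

    # Hedged sketch (generic, not the paper's models): composing a word vector
    # from sub-word units, contrasting character-level and syllable-level
    # segmentation of the same word.
    import numpy as np

    rng = np.random.default_rng(0)
    dim = 32

    char_units = list("parallel")           # 8 character units
    syl_units = ["par", "al", "lel"]        # 3 hypothetical syllable units

    char_table = {u: rng.normal(size=dim) for u in set(char_units)}
    syl_table = {u: rng.normal(size=dim) for u in syl_units}

    # Mean pooling over unit embeddings; CNN/LSTM composers are common
    # alternatives in sub-word-aware language models. The syllable sequence
    # is shorter, which is one plausible source of the reported speed-up.
    char_word_vec = np.mean([char_table[u] for u in char_units], axis=0)
    syl_word_vec = np.mean([syl_table[u] for u in syl_units], axis=0)
    print(char_word_vec.shape, syl_word_vec.shape)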
Experiments with Russian to Kazakh sentence alignment
Sentence alignment is the final step in building parallel corpora, and it arguably has the greatest impact on the quality of the resulting corpus and on the accuracy of machine translation systems that use it for training. However, the quality of sentence alignment itself depends on a number of factors. In this paper we investigate the impact of several data processing techniques on the quality of sentence alignment. We develop and use a number of automatic evaluation metrics, and provide empirical evidence that applying all of the considered data processing techniques yields bitexts with the lowest ratio of noise and the highest ratio of parallel sentences.
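The evaluation metrics themselves are not spelled out in the abstract; as a hedged stand-in, the sketch below computes one common, very simple noise proxy for a bitext, a character-length-ratio filter. The thresholds and the example Russian-Kazakh pairs are illustrative assumptions, not the paper's settings.

    # Hedged sketch: a crude automatic noise proxy for a bitext based on the
    # source/target character-length ratio of each aligned sentence pair.
    def noise_ratio(bitext, low=0.5, high=2.0):
        """Fraction of sentence pairs whose length ratio falls outside a
        plausible band (a rough stand-in for misaligned pairs)."""
        def implausible(src, tgt):
            if not src or not tgt:
                return True
            ratio = len(src) / len(tgt)
            return ratio < low or ratio > high
        flagged = sum(implausible(s, t) for s, t in bitext)
        return flagged / max(len(bitext), 1)

    pairs = [
        ("Пример предложения на русском языке.", "Орыс тіліндегі сөйлем мысалы."),
        ("Очень длинное русское предложение с большим количеством слов.", "Қысқа."),
    ]
    print(noise_ratio(pairs))   # the second pair is flagged as likely noise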
Gradient Descent Fails to Learn High-frequency Functions and Modular Arithmetic
Classes of target functions containing a large number of approximately orthogonal elements are known to be hard to learn by Statistical Query algorithms. Recently this classical fact re-emerged in the theory of gradient-based optimization of neural networks. In this framework, the hardness of a class is usually quantified by the variance of the gradient with respect to a random choice of a target function.
A set of functions of the form $x \mapsto ax \bmod p$, where $a$ is taken from $\mathbb{Z}_p$, has attracted some attention from deep learning theorists and cryptographers recently. This class can be understood as a subset of $p$-periodic functions on $\mathbb{Z}$ and is tightly connected with a class of high-frequency periodic functions on the real line.
We present a mathematical analysis of the limitations and challenges associated with using gradient-based learning techniques to learn a high-frequency periodic function or modular multiplication from examples. We highlight that the variance of the gradient is negligibly small in both cases when either the frequency or the prime base is large. This, in turn, prevents such a learning algorithm from being successful.
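A small numerical illustration of the near-orthogonality effect behind this hardness argument is sketched below; it is not taken from the paper. For targets $f_a(x) = ax \bmod p$, it estimates the variance over $a$ of the target-dependent part of a population-gradient coordinate, $\mathbb{E}_x[f_a(x)\,\varphi(x)]$ for a fixed probe $\varphi$; this variance shrinks roughly like $1/p$ as the prime grows. The probe, label rescaling, and subsampling of $a$ are illustrative assumptions.

    # Hedged illustration (not from the paper): the a-dependent part of a
    # population-gradient coordinate is E_x[f_a(x) * phi(x)] for a fixed probe
    # phi (playing the role of one coordinate of grad h_theta). Because the
    # targets f_a(x) = a*x mod p are nearly orthogonal, the variance of this
    # quantity over a random choice of a shrinks as p grows.
    import numpy as np

    def grad_component_variance(p, rng, n_targets=2000):
        x = np.arange(p)
        phi = rng.normal(size=p)
        phi -= phi.mean()                      # centered probe
        targets = rng.choice(np.arange(1, p), size=min(p - 1, n_targets), replace=False)
        comps = []
        for a in targets:
            y = ((a * x) % p) / p - 0.5        # centered, rescaled labels of f_a
            comps.append(np.mean(y * phi))     # E_x[f_a(x) * phi(x)]
        return np.var(comps)

    rng = np.random.default_rng(0)
    for p in [101, 1009, 10007]:
        print(p, grad_component_variance(p, rng))   # decreases roughly like 1/p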
Long-Tail Theory under Gaussian Mixtures
We suggest a simple Gaussian mixture model for data generation that complies with Feldman's long tail theory (2020). We demonstrate that a linear classifier cannot decrease the generalization error below a certain level in the proposed model, whereas a nonlinear classifier with a memorization capacity can. This confirms that for long-tailed distributions, rare training examples must be considered for optimal generalization to new data. Finally, we show that the performance gap between linear and nonlinear models narrows as the tail of the subpopulation frequency distribution becomes shorter, as confirmed by experiments on synthetic and real data. Comment: accepted to ECAI 2023
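A minimal end-to-end sketch in the spirit of this setup follows; the Zipf-like subpopulation frequencies, the circular placement of subpopulation means, and the choice of logistic regression versus 1-nearest-neighbour as the linear and memorizing nonlinear classifiers are all assumptions for illustration, not the paper's construction.

    # Hedged sketch (not the paper's construction): a two-class Gaussian
    # mixture whose subpopulations have long-tailed (Zipf-like) frequencies.
    # Class labels alternate around a circle of subpopulation means, so no
    # linear separator fits all subpopulations, while a memorizing 1-NN
    # classifier also gets the rare ones right.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    n_sub, dim, n_train, n_test = 20, 2, 4000, 4000

    freqs = 1.0 / np.arange(1, n_sub + 1)      # long-tailed subpopulation frequencies
    freqs /= freqs.sum()

    angles = 2 * np.pi * np.arange(n_sub) / n_sub
    means = 5.0 * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    labels_of_sub = np.arange(n_sub) % 2       # label alternates with subpopulation

    def sample(n):
        subs = rng.choice(n_sub, size=n, p=freqs)
        X = means[subs] + 0.3 * rng.normal(size=(n, dim))
        return X, labels_of_sub[subs]

    X_tr, y_tr = sample(n_train)
    X_te, y_te = sample(n_test)

    linear = LogisticRegression().fit(X_tr, y_tr)
    memorizer = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
    print("linear test accuracy:   ", linear.score(X_te, y_te))
    print("nonlinear test accuracy:", memorizer.score(X_te, y_te))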